[codex] Trace cancelled inference streams #19839
@codex review
💡 Codex Review
codex/codex-rs/core/src/client.rs
Lines 1664 to 1669 in c5166ee
In map_response_stream, a closed tx_event causes an early return (e.g. after OutputItemDone) before any terminal trace call. If the consumer drops between select! and send, the attempt never records InferenceCancelled, so collected items_added are lost and the inference is only force-closed later at turn end without partial output evidence.
jif-oai left a comment
Can we make the mapper own one “finish once” path for every non-completed stream exit, and keep the reducer fallback as a last-resort repair only?
while let Some(event) = api_stream.next().await {
loop {
    let event = tokio::select! {
        _ = consumer_dropped.cancelled() => {
This only records cancellation while the mapper is waiting on api_stream.next(). If the consumer is dropped while we’re blocked in one of the tx_event.send(...).await paths, we lose it, so an interrupted stream can still miss the cancellation record this change is meant to add.
Yep, good catch. The token only covers the api_stream.next() wait. I added explicit cancellation recording on the tx_event.send(...).await error paths too, so if the receiver is dropped while a send is blocked we still close the inference as cancelled and keep the items observed so far.
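The send-error path described above can be modelled roughly as follows. This is a minimal synchronous sketch, not the code in client.rs: it swaps the async tokio channel for `std::sync::mpsc`, and the `Event`, `Outcome`, and `map_stream` names are hypothetical stand-ins. The one property it relies on is that `send` returns an error once the receiver has been dropped.

```rust
use std::sync::mpsc::SyncSender;

/// Hypothetical stand-in for the provider events the mapper forwards.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    OutputItemDone(String),
    Completed,
}

/// Hypothetical terminal outcome recorded for one inference attempt.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Completed { items: Vec<String> },
    Cancelled { items: Vec<String> },
}

/// Forwards events to the consumer and returns exactly one terminal outcome.
pub fn map_stream(api_events: Vec<Event>, tx_event: SyncSender<Event>) -> Outcome {
    let mut items = Vec::new();
    for event in api_events {
        // Remember every completed output item before trying to forward it.
        if let Event::OutputItemDone(item) = &event {
            items.push(item.clone());
        }
        let is_completed = event == Event::Completed;
        // `send` fails only when the receiver is gone: record the cancellation
        // here instead of returning without a terminal outcome.
        if tx_event.send(event).is_err() {
            return Outcome::Cancelled { items };
        }
        if is_completed {
            return Outcome::Completed { items };
        }
    }
    // Assumption of this sketch: a stream that ends without a completion
    // event is also closed as cancelled. The real mapper may classify it differently.
    Outcome::Cancelled { items }
}
```

Calling `map_stream` with a sender whose receiver was already dropped returns `Outcome::Cancelled` carrying the items seen up to the failed send, which is the behaviour the reply describes for a blocked `tx_event.send(...).await`.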
Yes. I updated the stream mapper so current streams should now close themselves explicitly instead of depending on reducer repair.
API/provider stream error records InferenceFailed, carrying any completed output items observed before the error.
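One way to picture a single “finish once” path is a small guard that accepts the first terminal outcome and ignores later ones. The sketch below is illustrative only: `InferenceAttempt`, `Terminal`, and the method names are hypothetical and not taken from the PR.

```rust
/// Hypothetical terminal states for one inference attempt.
#[derive(Debug, Clone, PartialEq)]
pub enum Terminal {
    Completed { items: Vec<String> },
    Failed { items: Vec<String> },
    Cancelled { items: Vec<String> },
}

/// Hypothetical trace handle: collects completed output items and
/// accepts exactly one terminal outcome.
#[derive(Debug, Default)]
pub struct InferenceAttempt {
    items: Vec<String>,
    terminal: Option<Terminal>,
}

impl InferenceAttempt {
    /// Records a completed output item while the attempt is still open.
    pub fn item_done(&mut self, item: &str) {
        if self.terminal.is_none() {
            self.items.push(item.to_string());
        }
    }

    /// The single exit path: only the first call wins; later calls return false.
    fn finish(&mut self, terminal: Terminal) -> bool {
        if self.terminal.is_some() {
            return false;
        }
        self.terminal = Some(terminal);
        true
    }

    pub fn complete(&mut self) -> bool {
        let items = self.items.clone();
        self.finish(Terminal::Completed { items })
    }

    pub fn fail(&mut self) -> bool {
        let items = self.items.clone();
        self.finish(Terminal::Failed { items })
    }

    pub fn cancel(&mut self) -> bool {
        let items = self.items.clone();
        self.finish(Terminal::Cancelled { items })
    }

    pub fn terminal(&self) -> Option<&Terminal> {
        self.terminal.as_ref()
    }
}
```

With a guard like this, every early return in the mapper can call `cancel()` or `fail()` without worrying about double recording, and a reducer-side repair only matters when no terminal call ran at all.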
Records cancelled inference streams when Codex stops consuming a provider response before response.completed, preserving complete output items observed before cancellation.
Also closes still-running inference calls when the owning turn ends, so reduced rollout traces do not leave stale Running inference nodes.
Covered by focused reducer tests and a core stream-drop test for partial output preservation.
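The turn-end fallback can be sketched as a reducer that closes any inference node still marked Running when its owning turn ends. Everything below is an assumption made for illustration: the event and node types are hypothetical, and closing leftovers as `Cancelled` is a choice of this sketch, since the description does not say which status the real reducer assigns.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Running,
    Completed,
    Cancelled,
}

/// Hypothetical trace events; the real rollout trace schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum TraceEvent {
    InferenceStarted { id: u32, turn: u32 },
    InferenceCompleted { id: u32 },
    InferenceCancelled { id: u32 },
    TurnEnded { turn: u32 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub turn: u32,
    pub status: Status,
}

/// Moves a node out of Running; an already closed node is left untouched.
fn set_if_running(nodes: &mut BTreeMap<u32, Node>, id: u32, status: Status) {
    if let Some(node) = nodes.get_mut(&id) {
        if node.status == Status::Running {
            node.status = status;
        }
    }
}

/// Reduces events into inference nodes keyed by inference id.
pub fn reduce(events: &[TraceEvent]) -> BTreeMap<u32, Node> {
    let mut nodes = BTreeMap::new();
    for event in events {
        match event {
            TraceEvent::InferenceStarted { id, turn } => {
                nodes.insert(*id, Node { turn: *turn, status: Status::Running });
            }
            TraceEvent::InferenceCompleted { id } => {
                set_if_running(&mut nodes, *id, Status::Completed);
            }
            TraceEvent::InferenceCancelled { id } => {
                set_if_running(&mut nodes, *id, Status::Cancelled);
            }
            // Last-resort repair: nothing owned by this turn may stay Running.
            TraceEvent::TurnEnded { turn } => {
                for node in nodes.values_mut() {
                    if node.turn == *turn && node.status == Status::Running {
                        node.status = Status::Cancelled;
                    }
                }
            }
        }
    }
    nodes
}
```

In this model an inference that was explicitly completed or cancelled keeps its status, and only nodes that never received a terminal event are closed by `TurnEnded`.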